Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 192 |
| Missing cells (%) | 1.6% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 74.3 KiB |
| Average record size in memory | 76.1 B |
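The derived figures in the table above follow from the raw counts. A minimal sketch of that arithmetic, assuming a "cell" is one value of one variable in one row and that 1 KiB is 1024 bytes (the variable names are illustrative and do not come from the report):

```python
# Figures copied from the dataset statistics table.
n_variables = 12
n_observations = 1000
missing_cells = 192
total_size_kib = 74.3

# Assumption: a cell is one value of one variable in one row.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 192 missing cells are also the sum of the two per-variable missing counts reported in the warnings (128 for revenue and 64 for metascore).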
Variable types
| Numeric | 7 |
|---|---|
| Categorical | 5 |
title has a high cardinality: 999 distinct values | High cardinality |
genre has a high cardinality: 207 distinct values | High cardinality |
description has a high cardinality: 1000 distinct values | High cardinality |
director has a high cardinality: 644 distinct values | High cardinality |
actors has a high cardinality: 996 distinct values | High cardinality |
revenue (millions) has 128 (12.8%) missing values | Missing |
metascore has 64 (6.4%) missing values | Missing |
rank is uniformly distributed | Uniform |
title is uniformly distributed | Uniform |
description is uniformly distributed | Uniform |
director is uniformly distributed | Uniform |
actors is uniformly distributed | Uniform |
rank has unique values | Unique |
description has unique values | Unique |
Reproduction
| Analysis started | 2021-01-30 16:31:40.415377 |
|---|---|
| Analysis finished | 2021-01-30 16:31:54.978053 |
| Duration | 14.56 seconds |
| Software version | pandas-profiling v2.10.0 |
| Download configuration | config.yaml |
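The reported duration is simply the difference between the two timestamps. A small sketch, assuming both timestamps are in the same time zone and use the format shown in the table:

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-01-30 16:31:40.415377", FMT)
finished = datetime.strptime("2021-01-30 16:31:54.978053", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```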
rank
Real number (ℝ≥0)
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 500.5 |
|---|---|
| Minimum | 1 |
| Maximum | 1000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 50.95 |
| Q1 | 250.75 |
| median | 500.5 |
| Q3 | 750.25 |
| 95-th percentile | 950.05 |
| Maximum | 1000 |
| Range | 999 |
| Interquartile range (IQR) | 499.5 |
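The fractional percentiles above (for example 50.95 and 250.75) are what linear interpolation between neighbouring ranks produces, which is the default rule in pandas `Series.quantile`. A sketch under the assumption that the column holds exactly the integers 1 to 1000; the `quantile` helper is hypothetical and written here only for illustration:

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest positions."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Assumption: the variable is exactly 1, 2, ..., 1000.
ranks = list(range(1, 1001))

q05 = quantile(ranks, 0.05)
q1 = quantile(ranks, 0.25)
median = quantile(ranks, 0.50)
q3 = quantile(ranks, 0.75)
q95 = quantile(ranks, 0.95)
iqr = q3 - q1

print(q05, q1, median, q3, q95, iqr)
```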
Descriptive statistics
| Standard deviation | 288.8194361 |
|---|---|
| Coefficient of variation (CV) | 0.5770618104 |
| Kurtosis | -1.2 |
| Mean | 500.5 |
| Median Absolute Deviation (MAD) | 250 |
| Skewness | 0 |
| Sum | 500500 |
| Variance | 83416.66667 |
| Monotonicity | Strictly increasing |
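Several of the descriptive statistics above can be checked by hand. The sketch below assumes the column holds exactly the integers 1 to 1000 and that the variance is the sample variance (divisor n − 1); both are assumptions, although they do reproduce the reported figures:

```python
import math

# Assumption: the variable is exactly 1, 2, ..., 1000.
ranks = list(range(1, 1001))
n = len(ranks)

total = sum(ranks)
mean = total / n

# Assumption: sample variance with divisor n - 1.
variance = sum((x - mean) ** 2 for x in ranks) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

def median_of(values):
    """Median of a list, averaging the two middle values when the length is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Median absolute deviation: median of the distances from the median.
center = median_of(ranks)
mad = median_of([abs(x - center) for x in ranks])

print(total, mean, variance, std, cv, mad)
```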
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1000 | 1 | 0.1% |
| 329 | 1 | 0.1% |
| 342 | 1 | 0.1% |
| 341 | 1 | 0.1% |
| 340 | 1 | 0.1% |
| 339 | 1 | 0.1% |
| 338 | 1 | 0.1% |
| 337 | 1 | 0.1% |
| 336 | 1 | 0.1% |
| 335 | 1 | 0.1% |
| Other values (990) | 990 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 |
| Value | Count | Frequency (%) |
| 1000 | 1 | |
| 999 | 1 | |
| 998 | 1 | |
| 997 | 1 | |
| 996 | 1 |
title
Categorical
| Distinct | 999 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 KiB |
| The Host | 2 |
|---|---|
| The Invitation | 1 |
| La tortue rouge | 1 |
| Big Hero 6 | 1 |
| Relatos salvajes | 1 |
| Other values (994) |
Length
| Max length | 61 |
|---|---|
| Median length | 13 |
| Mean length | 14.539 |
| Min length | 2 |
Characters and Unicode
| Total characters | 14539 |
|---|---|
| Distinct characters | 81 |
| Distinct categories | 8 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
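As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one title from this dataset; the `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced here, and this is not necessarily how the report itself computes its tables:

```python
import unicodedata
from collections import Counter

# Readable labels for the two-letter general-category codes used below.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(text):
    """Count characters in text by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

counts = category_counts("Big Hero 6")
for name, count in counts.most_common():
    print(name, count)
```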
Unique
| Unique | 998 |
|---|---|
| Unique (%) | 99.8% |
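"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts the values that occur exactly once. A sketch using a synthetic stand-in for the column (the placeholder titles are invented for illustration; only "The Host" appearing twice is taken from the report):

```python
from collections import Counter

# Assumption: 998 placeholder titles that each occur once,
# plus one title that occurs twice, for 1000 rows in total.
titles = [f"Movie {i}" for i in range(998)] + ["The Host", "The Host"]

counts = Counter(titles)

# Distinct: number of different values.
n_distinct = len(counts)

# Unique: number of values that occur exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```

With one duplicated title among 1000 rows, this yields 999 distinct and 998 unique values, matching the figures reported for this variable.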
Sample
| 1st row | Guardians of the Galaxy |
|---|---|
| 2nd row | Prometheus |
| 3rd row | Split |
| 4th row | Sing |
| 5th row | Suicide Squad |
| Value | Count | Frequency (%) |
| The Host | 2 | 0.2% |
| The Invitation | 1 | 0.1% |
| La tortue rouge | 1 | 0.1% |
| Big Hero 6 | 1 | 0.1% |
| Relatos salvajes | 1 | 0.1% |
| True Crimes | 1 | 0.1% |
| Steve Jobs | 1 | 0.1% |
| It Follows | 1 | 0.1% |
| Superbad | 1 | 0.1% |
| Legend | 1 | 0.1% |
| Other values (989) | 989 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| the | 305 | 11.7% |
| of | 92 | 3.5% |
| a | 29 | 1.1% |
| and | 22 | 0.8% |
| 2 | 22 | 0.8% |
| in | 22 | 0.8% |
| | 15 | 0.6% |
| to | 12 | 0.5% |
| man | 12 | 0.5% |
| me | 11 | 0.4% |
| Other values (1429) | 2063 |
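The table above appears to count lower-cased words across all titles rather than whole values. A rough sketch of that kind of tally, assuming tokens are obtained by lower-casing and splitting on whitespace (the report's actual tokenization may differ), applied to a handful of titles that appear in this report:

```python
from collections import Counter

# A few titles taken from the sample and common-value tables.
sample_titles = [
    "Guardians of the Galaxy",
    "Prometheus",
    "Split",
    "Sing",
    "Suicide Squad",
    "The Host",
    "The Invitation",
]

# Assumption: words are lower-cased and split on whitespace.
word_counts = Counter(
    word for title in sample_titles for word in title.lower().split()
)

for word, count in word_counts.most_common(3):
    print(word, count)
```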
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1605 | 11.0% |
| e | 1507 | 10.4% |
| a | 884 | 6.1% |
| o | 851 | 5.9% |
| n | 828 | 5.7% |
| r | 799 | 5.5% |
| i | 775 | 5.3% |
| t | 720 | 5.0% |
| s | 609 | 4.2% |
| h | 539 | 3.7% |
| Other values (71) | 5422 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 10340 | |
| Uppercase Letter | 2274 | 15.6% |
| Space Separator | 1605 | 11.0% |
| Other Punctuation | 171 | 1.2% |
| Decimal Number | 110 | 0.8% |
| Dash Punctuation | 31 | 0.2% |
| Open Punctuation | 4 | < 0.1% |
| Close Punctuation | 4 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 1507 | |
| a | 884 | 8.5% |
| o | 851 | 8.2% |
| n | 828 | 8.0% |
| r | 799 | 7.7% |
| i | 775 | 7.5% |
| t | 720 | 7.0% |
| s | 609 | 5.9% |
| h | 539 | 5.2% |
| l | 457 | 4.4% |
| Other values (22) | 2371 |
| Value | Count | Frequency (%) |
| T | 350 | |
| S | 188 | 8.3% |
| M | 141 | 6.2% |
| B | 133 | 5.8% |
| D | 125 | 5.5% |
| A | 115 | 5.1% |
| P | 110 | 4.8% |
| H | 105 | 4.6% |
| C | 104 | 4.6% |
| W | 100 | 4.4% |
| Other values (16) | 803 |
| Value | Count | Frequency (%) |
| 2 | 35 | |
| 3 | 17 | |
| 1 | 15 | |
| 0 | 15 | |
| 5 | 7 | 6.4% |
| 4 | 7 | 6.4% |
| 7 | 5 | 4.5% |
| 6 | 3 | 2.7% |
| 9 | 3 | 2.7% |
| 8 | 3 | 2.7% |
| Value | Count | Frequency (%) |
| : | 85 | |
| ' | 39 | |
| . | 23 | 13.5% |
| , | 9 | 5.3% |
| & | 6 | 3.5% |
| ! | 4 | 2.3% |
| ? | 2 | 1.2% |
| / | 2 | 1.2% |
| · | 1 | 0.6% |
| Value | Count | Frequency (%) |
| (space) | 1605 |
| Value | Count | Frequency (%) |
| - | 31 |
| Value | Count | Frequency (%) |
| ( | 4 |
| Value | Count | Frequency (%) |
| ) | 4 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 12614 | |
| Common | 1925 | 13.2% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 1507 | 11.9% |
| a | 884 | 7.0% |
| o | 851 | 6.7% |
| n | 828 | 6.6% |
| r | 799 | 6.3% |
| i | 775 | 6.1% |
| t | 720 | 5.7% |
| s | 609 | 4.8% |
| h | 539 | 4.3% |
| l | 457 | 3.6% |
| Other values (48) | 4645 |
| Value | Count | Frequency (%) |
| (space) | 1605 | |
| : | 85 | 4.4% |
| ' | 39 | 2.0% |
| 2 | 35 | 1.8% |
| - | 31 | 1.6% |
| . | 23 | 1.2% |
| 3 | 17 | 0.9% |
| 1 | 15 | 0.8% |
| 0 | 15 | 0.8% |
| , | 9 | 0.5% |
| Other values (13) | 51 | 2.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14530 | |
| None | 9 | 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 1605 | 11.0% |
| e | 1507 | 10.4% |
| a | 884 | 6.1% |
| o | 851 | 5.9% |
| n | 828 | 5.7% |
| r | 799 | 5.5% |
| i | 775 | 5.3% |
| t | 720 | 5.0% |
| s | 609 | 4.2% |
| h | 539 | 3.7% |
| Other values (64) | 5413 |
| Value | Count | Frequency (%) |
| é | 3 | |
| è | 1 | 11.1% |
| ä | 1 | 11.1% |
| · | 1 | 11.1% |
| í | 1 | 11.1% |
| á | 1 | 11.1% |
| ç | 1 | 11.1% |
genre
Categorical
| Distinct | 207 |
|---|---|
| Distinct (%) | 20.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 KiB |
| Action,Adventure,Sci-Fi | 50 |
|---|---|
| Drama | 48 |
| Comedy,Drama,Romance | 35 |
| Comedy | 32 |
| Drama,Romance | 31 |
| Other values (202) |
Length
| Max length | 26 |
|---|---|
| Median length | 20 |
| Mean length | 18.095 |
| Min length | 5 |
Characters and Unicode
| Total characters | 18095 |
|---|---|
| Distinct characters | 31 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 85 |
|---|---|
| Unique (%) | 8.5% |
Sample
| 1st row | Action,Adventure,Sci-Fi |
|---|---|
| 2nd row | Adventure,Mystery,Sci-Fi |
| 3rd row | Horror,Thriller |
| 4th row | Animation,Comedy,Family |
| 5th row | Action,Adventure,Fantasy |
| Value | Count | Frequency (%) |
| Action,Adventure,Sci-Fi | 50 | 5.0% |
| Drama | 48 | 4.8% |
| Comedy,Drama,Romance | 35 | 3.5% |
| Comedy | 32 | 3.2% |
| Drama,Romance | 31 | 3.1% |
| Animation,Adventure,Comedy | 27 | 2.7% |
| Comedy,Drama | 27 | 2.7% |
| Action,Adventure,Fantasy | 27 | 2.7% |
| Comedy,Romance | 26 | 2.6% |
| Crime,Drama,Thriller | 24 | 2.4% |
| Other values (197) | 673 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| action,adventure,sci-fi | 50 | 5.0% |
| drama | 48 | 4.8% |
| comedy,drama,romance | 35 | 3.5% |
| comedy | 32 | 3.2% |
| drama,romance | 31 | 3.1% |
| animation,adventure,comedy | 27 | 2.7% |
| action,adventure,fantasy | 27 | 2.7% |
| comedy,drama | 27 | 2.7% |
| comedy,romance | 26 | 2.6% |
| crime,drama,thriller | 24 | 2.4% |
| Other values (197) | 673 |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 1923 | 10.6% |
| a | 1568 | 8.7% |
| , | 1555 | 8.6% |
| e | 1403 | 7.8% |
| m | 1183 | 6.5% |
| i | 1168 | 6.5% |
| o | 1138 | 6.3% |
| n | 909 | 5.0% |
| t | 872 | 4.8% |
| y | 753 | 4.2% |
| Other values (21) | 5623 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 13745 | |
| Uppercase Letter | 2675 | 14.8% |
| Other Punctuation | 1555 | 8.6% |
| Dash Punctuation | 120 | 0.7% |
Most frequent character per category
| Value | Count | Frequency (%) |
| r | 1923 | |
| a | 1568 | |
| e | 1403 | |
| m | 1183 | |
| i | 1168 | |
| o | 1138 | |
| n | 909 | |
| t | 872 | 6.3% |
| y | 753 | 5.5% |
| c | 585 | 4.3% |
| Other values (8) | 2243 |
| Value | Count | Frequency (%) |
| A | 611 | |
| D | 513 | |
| C | 429 | |
| F | 272 | |
| T | 195 | 7.3% |
| H | 148 | 5.5% |
| R | 141 | 5.3% |
| S | 138 | 5.2% |
| M | 127 | 4.7% |
| B | 81 | 3.0% |
| Value | Count | Frequency (%) |
| , | 1555 |
| Value | Count | Frequency (%) |
| - | 120 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 16420 | |
| Common | 1675 | 9.3% |
Most frequent character per script
| Value | Count | Frequency (%) |
| r | 1923 | 11.7% |
| a | 1568 | 9.5% |
| e | 1403 | 8.5% |
| m | 1183 | 7.2% |
| i | 1168 | 7.1% |
| o | 1138 | 6.9% |
| n | 909 | 5.5% |
| t | 872 | 5.3% |
| y | 753 | 4.6% |
| A | 611 | 3.7% |
| Other values (19) | 4892 |
| Value | Count | Frequency (%) |
| , | 1555 | |
| - | 120 | 7.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 18095 |
Most frequent character per block
| Value | Count | Frequency (%) |
| r | 1923 | 10.6% |
| a | 1568 | 8.7% |
| , | 1555 | 8.6% |
| e | 1403 | 7.8% |
| m | 1183 | 6.5% |
| i | 1168 | 6.5% |
| o | 1138 | 6.3% |
| n | 909 | 5.0% |
| t | 872 | 4.8% |
| y | 753 | 4.2% |
| Other values (21) | 5623 |
description
Categorical
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 KiB |
| A tight-knit team of rising investigators, along with their supervisor, is suddenly torn apart when they discover that one of their own teenage daughters has been brutally murdered. | 1 |
|---|---|
| Three buddies wake up from a bachelor party in Las Vegas, with no memory of the previous night and the bachelor missing. They make their way around the city in order to find their friend before his wedding. | 1 |
| In 1984 East Berlin, an agent of the secret police, conducting surveillance on a writer and his lover, finds himself becoming increasingly absorbed by their lives. | 1 |
| After an experimental bio-weapon is released, turning thousands into zombie-like creatures, it's up to a rag-tag group of survivors to stop the infected and those behind its release. | 1 |
| Brian O'Conner, back working for the FBI in Los Angeles, teams up with Dominic Toretto to bring down a heroin importer by infiltrating his operation. | 1 |
| Other values (995) |
Length
| Max length | 421 |
|---|---|
| Median length | 159 |
| Mean length | 163.232 |
| Min length | 42 |
Characters and Unicode
| Total characters | 163232 |
|---|---|
| Distinct characters | 82 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe. |
|---|---|
| 2nd row | Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone. |
| 3rd row | Three girls are kidnapped by a man with a diagnosed 23 distinct personalities. They must try to escape before the apparent emergence of a frightful new 24th. |
| 4th row | In a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists' find that their lives will never be the same. |
| 5th row | A secret government agency recruits some of the most dangerous incarcerated super-villains to form a defensive task force. Their first mission: save the world from the apocalypse. |
| Value | Count | Frequency (%) |
| A tight-knit team of rising investigators, along with their supervisor, is suddenly torn apart when they discover that one of their own teenage daughters has been brutally murdered. | 1 | 0.1% |
| Three buddies wake up from a bachelor party in Las Vegas, with no memory of the previous night and the bachelor missing. They make their way around the city in order to find their friend before his wedding. | 1 | 0.1% |
| In 1984 East Berlin, an agent of the secret police, conducting surveillance on a writer and his lover, finds himself becoming increasingly absorbed by their lives. | 1 | 0.1% |
| After an experimental bio-weapon is released, turning thousands into zombie-like creatures, it's up to a rag-tag group of survivors to stop the infected and those behind its release. | 1 | 0.1% |
| Brian O'Conner, back working for the FBI in Los Angeles, teams up with Dominic Toretto to bring down a heroin importer by infiltrating his operation. | 1 | 0.1% |
| A head chef quits his restaurant job and buys a food truck in an effort to reclaim his creative promise, while piecing back together his estranged family. | 1 | 0.1% |
| Newlywed couple Ted and Tami-Lynn want to have a baby, but in order to qualify to be a parent, Ted will have to prove he's a person in a court of law. | 1 | 0.1% |
| In an emotionless utopia, two people fall in love when they regain their feelings from a mysterious disease, causing tensions between them and their society. | 1 | 0.1% |
| When a member of a popular New York City improv troupe gets a huge break, the rest of the group - all best friends - start to realize that not everyone is going to make it after all. | 1 | 0.1% |
| A grief-stricken mother takes on the LAPD to her own detriment when it stubbornly tries to pass off an obvious impostor as her missing child, while also refusing to give up hope that she will find him one day. | 1 | 0.1% |
| Other values (990) | 990 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| a | 1626 | 5.8% |
| the | 1360 | 4.9% |
| to | 934 | 3.3% |
| of | 807 | 2.9% |
| and | 716 | 2.6% |
| in | 578 | 2.1% |
| his | 487 | 1.7% |
| an | 304 | 1.1% |
| is | 296 | 1.1% |
| with | 274 | 1.0% |
| Other values (6172) | 20539 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 26921 | |
| e | 15840 | 9.7% |
| t | 10926 | 6.7% |
| a | 10686 | 6.5% |
| i | 9657 | 5.9% |
| o | 9618 | 5.9% |
| n | 9602 | 5.9% |
| r | 9227 | 5.7% |
| s | 8727 | 5.3% |
| h | 6513 | 4.0% |
| Other values (72) | 45515 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 128516 | |
| Space Separator | 26921 | 16.5% |
| Uppercase Letter | 3786 | 2.3% |
| Other Punctuation | 2995 | 1.8% |
| Decimal Number | 506 | 0.3% |
| Dash Punctuation | 438 | 0.3% |
| Open Punctuation | 24 | < 0.1% |
| Close Punctuation | 24 | < 0.1% |
| Final Punctuation | 20 | < 0.1% |
| Currency Symbol | 2 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 15840 | |
| t | 10926 | 8.5% |
| a | 10686 | 8.3% |
| i | 9657 | 7.5% |
| o | 9618 | 7.5% |
| n | 9602 | 7.5% |
| r | 9227 | 7.2% |
| s | 8727 | 6.8% |
| h | 6513 | 5.1% |
| l | 5169 | 4.0% |
| Other values (20) | 32551 |
| Value | Count | Frequency (%) |
| A | 688 | |
| T | 290 | 7.7% |
| S | 271 | 7.2% |
| B | 227 | 6.0% |
| W | 211 | 5.6% |
| C | 204 | 5.4% |
| I | 201 | 5.3% |
| M | 192 | 5.1% |
| F | 142 | 3.8% |
| H | 140 | 3.7% |
| Other values (16) | 1220 |
| Value | Count | Frequency (%) |
| . | 1365 | |
| , | 1216 | |
| ' | 297 | 9.9% |
| " | 66 | 2.2% |
| : | 26 | 0.9% |
| ? | 11 | 0.4% |
| ; | 8 | 0.3% |
| / | 4 | 0.1% |
| ! | 1 | < 0.1% |
| # | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 110 | |
| 0 | 108 | |
| 9 | 92 | |
| 2 | 53 | |
| 7 | 31 | 6.1% |
| 8 | 28 | 5.5% |
| 6 | 25 | 4.9% |
| 4 | 23 | 4.5% |
| 5 | 23 | 4.5% |
| 3 | 13 | 2.6% |
| Value | Count | Frequency (%) |
| (space) | 26921 |
| Value | Count | Frequency (%) |
| - | 438 |
| Value | Count | Frequency (%) |
| ( | 24 |
| Value | Count | Frequency (%) |
| ) | 24 |
| Value | Count | Frequency (%) |
| » | 20 |
| Value | Count | Frequency (%) |
| $ | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 132302 | |
| Common | 30930 | 18.9% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 15840 | |
| t | 10926 | 8.3% |
| a | 10686 | 8.1% |
| i | 9657 | 7.3% |
| o | 9618 | 7.3% |
| n | 9602 | 7.3% |
| r | 9227 | 7.0% |
| s | 8727 | 6.6% |
| h | 6513 | 4.9% |
| l | 5169 | 3.9% |
| Other values (46) | 36337 |
| Value | Count | Frequency (%) |
| (space) | 26921 | |
| . | 1365 | 4.4% |
| , | 1216 | 3.9% |
| - | 438 | 1.4% |
| ' | 297 | 1.0% |
| 1 | 110 | 0.4% |
| 0 | 108 | 0.3% |
| 9 | 92 | 0.3% |
| " | 66 | 0.2% |
| 2 | 53 | 0.2% |
| Other values (16) | 264 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 163203 | |
| None | 29 | < 0.1% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 26921 | |
| e | 15840 | 9.7% |
| t | 10926 | 6.7% |
| a | 10686 | 6.5% |
| i | 9657 | 5.9% |
| o | 9618 | 5.9% |
| n | 9602 | 5.9% |
| r | 9227 | 5.7% |
| s | 8727 | 5.3% |
| h | 6513 | 4.0% |
| Other values (67) | 45486 |
| Value | Count | Frequency (%) |
| » | 20 | |
| é | 4 | 13.8% |
| á | 2 | 6.9% |
| è | 2 | 6.9% |
| í | 1 | 3.4% |
director
Categorical
| Distinct | 644 |
|---|---|
| Distinct (%) | 64.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 KiB |
| Ridley Scott | 8 |
|---|---|
| David Yates | 6 |
| Michael Bay | 6 |
| Paul W.S. Anderson | 6 |
| M. Night Shyamalan | 6 |
| Other values (639) |
Length
| Max length | 32 |
|---|---|
| Median length | 13 |
| Mean length | 13.139 |
| Min length | 3 |
Characters and Unicode
| Total characters | 13139 |
|---|---|
| Distinct characters | 69 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 444 |
|---|---|
| Unique (%) | 44.4% |
Sample
| 1st row | James Gunn |
|---|---|
| 2nd row | Ridley Scott |
| 3rd row | M. Night Shyamalan |
| 4th row | Christophe Lourdelet |
| 5th row | David Ayer |
| Value | Count | Frequency (%) |
| Ridley Scott | 8 | 0.8% |
| David Yates | 6 | 0.6% |
| Michael Bay | 6 | 0.6% |
| Paul W.S. Anderson | 6 | 0.6% |
| M. Night Shyamalan | 6 | 0.6% |
| Antoine Fuqua | 5 | 0.5% |
| Denis Villeneuve | 5 | 0.5% |
| Danny Boyle | 5 | 0.5% |
| Martin Scorsese | 5 | 0.5% |
| Zack Snyder | 5 | 0.5% |
| Other values (634) | 943 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| david | 38 | 1.8% |
| john | 25 | 1.2% |
| michael | 22 | 1.1% |
| james | 21 | 1.0% |
| scott | 20 | 1.0% |
| paul | 19 | 0.9% |
| robert | 14 | 0.7% |
| steven | 13 | 0.6% |
| lee | 12 | 0.6% |
| peter | 12 | 0.6% |
| Other values (977) | 1896 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 1223 | 9.3% |
| (space) | 1092 | 8.3% |
| a | 1056 | 8.0% |
| n | 937 | 7.1% |
| r | 875 | 6.7% |
| o | 783 | 6.0% |
| i | 740 | 5.6% |
| l | 604 | 4.6% |
| t | 486 | 3.7% |
| s | 467 | 3.6% |
| Other values (59) | 4876 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9802 | |
| Uppercase Letter | 2153 | 16.4% |
| Space Separator | 1092 | 8.3% |
| Other Punctuation | 73 | 0.6% |
| Dash Punctuation | 19 | 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 1223 | |
| a | 1056 | |
| n | 937 | |
| r | 875 | 8.9% |
| o | 783 | 8.0% |
| i | 740 | 7.5% |
| l | 604 | 6.2% |
| t | 486 | 5.0% |
| s | 467 | 4.8% |
| h | 357 | 3.6% |
| Other values (28) | 2274 |
| Value | Count | Frequency (%) |
| S | 207 | 9.6% |
| J | 200 | 9.3% |
| M | 183 | 8.5% |
| A | 148 | 6.9% |
| D | 137 | 6.4% |
| G | 131 | 6.1% |
| B | 127 | 5.9% |
| C | 123 | 5.7% |
| R | 119 | 5.5% |
| L | 108 | 5.0% |
| Other values (17) | 670 |
| Value | Count | Frequency (%) |
| . | 71 | |
| ' | 2 | 2.7% |
| Value | Count | Frequency (%) |
| (space) | 1092 |
| Value | Count | Frequency (%) |
| - | 19 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11955 | |
| Common | 1184 | 9.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 1223 | 10.2% |
| a | 1056 | 8.8% |
| n | 937 | 7.8% |
| r | 875 | 7.3% |
| o | 783 | 6.5% |
| i | 740 | 6.2% |
| l | 604 | 5.1% |
| t | 486 | 4.1% |
| s | 467 | 3.9% |
| h | 357 | 3.0% |
| Other values (55) | 4427 |
| Value | Count | Frequency (%) |
| (space) | 1092 | |
| . | 71 | 6.0% |
| - | 19 | 1.6% |
| ' | 2 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13095 | |
| None | 44 | 0.3% |
Most frequent character per block
| Value | Count | Frequency (%) |
| e | 1223 | 9.3% |
| (space) | 1092 | 8.3% |
| a | 1056 | 8.1% |
| n | 937 | 7.2% |
| r | 875 | 6.7% |
| o | 783 | 6.0% |
| i | 740 | 5.7% |
| l | 604 | 4.6% |
| t | 486 | 3.7% |
| s | 467 | 3.6% |
| Other values (46) | 4832 |
| Value | Count | Frequency (%) |
| é | 11 | |
| á | 9 | |
| ó | 4 | 9.1% |
| ö | 4 | 9.1% |
| å | 4 | 9.1% |
| ñ | 3 | 6.8% |
| ç | 3 | 6.8% |
| Ø | 1 | 2.3% |
| í | 1 | 2.3% |
| ë | 1 | 2.3% |
| Other values (3) | 3 | 6.8% |
actors
Categorical
| Distinct | 996 |
|---|---|
| Distinct (%) | 99.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 KiB |
| Gerard Butler, Aaron Eckhart, Morgan Freeman,Angela Bassett | 2 |
|---|---|
| Shia LaBeouf, Megan Fox, Josh Duhamel, Tyrese Gibson | 2 |
| Daniel Radcliffe, Emma Watson, Rupert Grint, Michael Gambon | 2 |
| Jennifer Lawrence, Josh Hutcherson, Liam Hemsworth, Woody Harrelson | 2 |
| Shia LaBeouf, David Morse, Carrie-Anne Moss, Sarah Roemer | 1 |
| Other values (991) |
Length
| Max length | 77 |
|---|---|
| Median length | 58 |
| Mean length | 58.288 |
| Min length | 43 |
Characters and Unicode
| Total characters | 58288 |
|---|---|
| Distinct characters | 79 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 992 |
|---|---|
| Unique (%) | 99.2% |
Sample
| 1st row | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana |
|---|---|
| 2nd row | Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron |
| 3rd row | James McAvoy, Anya Taylor-Joy, Haley Lu Richardson, Jessica Sula |
| 4th row | Matthew McConaughey,Reese Witherspoon, Seth MacFarlane, Scarlett Johansson |
| 5th row | Will Smith, Jared Leto, Margot Robbie, Viola Davis |
| Value | Count | Frequency (%) |
| Gerard Butler, Aaron Eckhart, Morgan Freeman,Angela Bassett | 2 | 0.2% |
| Shia LaBeouf, Megan Fox, Josh Duhamel, Tyrese Gibson | 2 | 0.2% |
| Daniel Radcliffe, Emma Watson, Rupert Grint, Michael Gambon | 2 | 0.2% |
| Jennifer Lawrence, Josh Hutcherson, Liam Hemsworth, Woody Harrelson | 2 | 0.2% |
| Shia LaBeouf, David Morse, Carrie-Anne Moss, Sarah Roemer | 1 | 0.1% |
| Shailene Woodley, Ansel Elgort, Nat Wolff, Laura Dern | 1 | 0.1% |
| Adam Sandler, Jennifer Aniston, Brooklyn Decker,Nicole Kidman | 1 | 0.1% |
| Willem Dafoe, Charlotte Gainsbourg, Storm Acheche Sahlstrøm | 1 | 0.1% |
| Steve Carell, Ryan Gosling, Julianne Moore, Emma Stone | 1 | 0.1% |
| Tom Hardy, Noomi Rapace, James Gandolfini,Matthias Schoenaerts | 1 | 0.1% |
| Other values (986) | 986 |
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| michael | 62 | 0.8% |
| james | 50 | 0.6% |
| tom | 44 | 0.6% |
| chris | 42 | 0.5% |
| john | 42 | 0.5% |
| jason | 41 | 0.5% |
| robert | 37 | 0.5% |
| mark | 35 | 0.4% |
| jennifer | 35 | 0.4% |
| ben | 31 | 0.4% |
| Other values (2924) | 7441 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 6860 | 11.8% |
| e | 5007 | 8.6% |
| a | 4812 | 8.3% |
| n | 3867 | 6.6% |
| i | 3216 | 5.5% |
| r | 3171 | 5.4% |
| , | 2999 | 5.1% |
| o | 2949 | 5.1% |
| l | 2811 | 4.8% |
| s | 1931 | 3.3% |
| Other values (69) | 20665 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 39800 | |
| Uppercase Letter | 8428 | 14.5% |
| Space Separator | 6860 | 11.8% |
| Other Punctuation | 3117 | 5.3% |
| Dash Punctuation | 81 | 0.1% |
| Decimal Number | 2 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| e | 5007 | |
| a | 4812 | |
| n | 3867 | |
| i | 3216 | 8.1% |
| r | 3171 | 8.0% |
| o | 2949 | 7.4% |
| l | 2811 | 7.1% |
| s | 1931 | 4.9% |
| t | 1924 | 4.8% |
| h | 1557 | 3.9% |
| Other values (33) | 8555 |
| Value | Count | Frequency (%) |
| J | 749 | 8.9% |
| M | 725 | 8.6% |
| C | 661 | 7.8% |
| S | 632 | 7.5% |
| B | 618 | 7.3% |
| A | 520 | 6.2% |
| R | 507 | 6.0% |
| D | 473 | 5.6% |
| L | 389 | 4.6% |
| H | 385 | 4.6% |
| Other values (19) | 2769 |
| Value | Count | Frequency (%) |
| , | 2999 | |
| . | 91 | 2.9% |
| ' | 27 | 0.9% |
| Value | Count | Frequency (%) |
| 5 | 1 | |
| 0 | 1 |
| Value | Count | Frequency (%) |
| (space) | 6860 |
| Value | Count | Frequency (%) |
| - | 81 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 48228 | |
| Common | 10060 | 17.3% |
Most frequent character per script
| Value | Count | Frequency (%) |
| e | 5007 | 10.4% |
| a | 4812 | 10.0% |
| n | 3867 | 8.0% |
| i | 3216 | 6.7% |
| r | 3171 | 6.6% |
| o | 2949 | 6.1% |
| l | 2811 | 5.8% |
| s | 1931 | 4.0% |
| t | 1924 | 4.0% |
| h | 1557 | 3.2% |
| Other values (62) | 16983 |
| Value | Count | Frequency (%) |
| (space) | 6860 | |
| , | 2999 | |
| . | 91 | 0.9% |
| - | 81 | 0.8% |
| ' | 27 | 0.3% |
| 5 | 1 | < 0.1% |
| 0 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 58177 | |
| None | 111 | 0.2% |
Most frequent character per block
| Value | Count | Frequency (%) |
| (space) | 6860 | 11.8% |
| e | 5007 | 8.6% |
| a | 4812 | 8.3% |
| n | 3867 | 6.6% |
| i | 3216 | 5.5% |
| r | 3171 | 5.5% |
| , | 2999 | 5.2% |
| o | 2949 | 5.1% |
| l | 2811 | 4.8% |
| s | 1931 | 3.3% |
| Other values (49) | 20554 |
| Value | Count | Frequency (%) |
| é | 29 | |
| ë | 16 | |
| á | 12 | |
| í | 10 | 9.0% |
| å | 10 | 9.0% |
| ü | 6 | 5.4% |
| ñ | 5 | 4.5% |
| è | 4 | 3.6% |
| Ó | 3 | 2.7% |
| ô | 2 | 1.8% |
| Other values (10) | 14 |
year
Real number (ℝ≥0)
| Distinct | 11 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2012.783 |
|---|---|
| Minimum | 2006 |
| Maximum | 2016 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 2006 |
|---|---|
| 5-th percentile | 2007 |
| Q1 | 2010 |
| median | 2014 |
| Q3 | 2016 |
| 95-th percentile | 2016 |
| Maximum | 2016 |
| Range | 10 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.205961508 |
|---|---|
| Coefficient of variation (CV) | 0.00159280037 |
| Kurtosis | -0.8219639755 |
| Mean | 2012.783 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.6898787091 |
| Sum | 2012783 |
| Variance | 10.27818919 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=11)
| Value | Count | Frequency (%) |
| 2016 | 297 | |
| 2015 | 127 | |
| 2014 | 98 | 9.8% |
| 2013 | 91 | 9.1% |
| 2012 | 64 | 6.4% |
| 2011 | 63 | 6.3% |
| 2010 | 60 | 6.0% |
| 2007 | 53 | 5.3% |
| 2008 | 52 | 5.2% |
| 2009 | 51 | 5.1% |
| Value | Count | Frequency (%) |
| 2006 | 44 | |
| 2007 | 53 | |
| 2008 | 52 | |
| 2009 | 51 | |
| 2010 | 60 |
| Value | Count | Frequency (%) |
| 2016 | 297 | |
| 2015 | 127 | |
| 2014 | 98 | 9.8% |
| 2013 | 91 | 9.1% |
| 2012 | 64 | 6.4% |
runtime (minutes)
Real number (ℝ≥0)
| Distinct | 94 |
|---|---|
| Distinct (%) | 9.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 113.172 |
|---|---|
| Minimum | 66 |
| Maximum | 191 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 66 |
|---|---|
| 5-th percentile | 88 |
| Q1 | 100 |
| median | 111 |
| Q3 | 123 |
| 95-th percentile | 150 |
| Maximum | 191 |
| Range | 125 |
| Interquartile range (IQR) | 23 |
Descriptive statistics
| Standard deviation | 18.81090817 |
|---|---|
| Coefficient of variation (CV) | 0.1662152138 |
| Kurtosis | 0.8583211032 |
| Mean | 113.172 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.8467127314 |
| Sum | 113172 |
| Variance | 353.8502663 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 108 | 31 | 3.1% |
| 100 | 28 | 2.8% |
| 117 | 27 | 2.7% |
| 110 | 26 | 2.6% |
| 106 | 26 | 2.6% |
| 118 | 26 | 2.6% |
| 102 | 25 | 2.5% |
| 112 | 24 | 2.4% |
| 104 | 23 | 2.3% |
| 123 | 23 | 2.3% |
| Other values (84) | 741 |
| Value | Count | Frequency (%) |
| 66 | 1 | 0.1% |
| 73 | 2 | 0.2% |
| 80 | 2 | 0.2% |
| 81 | 5 | |
| 82 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 191 | 1 | 0.1% |
| 187 | 1 | 0.1% |
| 180 | 3 | |
| 172 | 1 | 0.1% |
| 170 | 1 | 0.1% |
rating
Real number (ℝ≥0)
| Distinct | 55 |
|---|---|
| Distinct (%) | 5.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.7232 |
|---|---|
| Minimum | 1.9 |
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1.9 |
|---|---|
| 5-th percentile | 5.1 |
| Q1 | 6.2 |
| median | 6.8 |
| Q3 | 7.4 |
| 95-th percentile | 8.1 |
| Maximum | 9 |
| Range | 7.1 |
| Interquartile range (IQR) | 1.2 |
Descriptive statistics
| Standard deviation | 0.9454287893 |
|---|---|
| Coefficient of variation (CV) | 0.1406218451 |
| Kurtosis | 1.322270288 |
| Mean | 6.7232 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | -0.7431419408 |
| Sum | 6723.2 |
| Variance | 0.8938355956 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7.1 | 52 | 5.2% |
| 6.7 | 48 | 4.8% |
| 7 | 46 | 4.6% |
| 6.3 | 44 | 4.4% |
| 6.6 | 42 | 4.2% |
| 7.2 | 42 | 4.2% |
| 7.3 | 42 | 4.2% |
| 6.5 | 40 | 4.0% |
| 7.8 | 40 | 4.0% |
| 6.2 | 37 | 3.7% |
| Other values (45) | 567 |
| Value | Count | Frequency (%) |
| 1.9 | 1 | |
| 2.7 | 2 | |
| 3.2 | 1 | |
| 3.5 | 2 | |
| 3.7 | 2 |
| Value | Count | Frequency (%) |
| 9 | 1 | 0.1% |
| 8.8 | 2 | 0.2% |
| 8.6 | 3 | |
| 8.5 | 6 | |
| 8.4 | 4 |
votes
Real number (ℝ≥0)
| Distinct | 997 |
|---|---|
| Distinct (%) | 99.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 169808.255 |
|---|---|
| Minimum | 61 |
| Maximum | 1791916 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 61 |
|---|---|
| 5-th percentile | 1260.35 |
| Q1 | 36309 |
| median | 110799 |
| Q3 | 239909.75 |
| 95-th percentile | 526551.85 |
| Maximum | 1791916 |
| Range | 1791855 |
| Interquartile range (IQR) | 203600.75 |
Descriptive statistics
| Standard deviation | 188762.6475 |
|---|---|
| Coefficient of variation (CV) | 1.111622327 |
| Kurtosis | 11.3126809 |
| Mean | 169808.255 |
| Median Absolute Deviation (MAD) | 88402 |
| Skewness | 2.507918483 |
| Sum | 169808255 |
| Variance | 3.56313371 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1427 | 2 | 0.2% |
| 97141 | 2 | 0.2% |
| 291 | 2 | 0.2% |
| 531112 | 1 | 0.1% |
| 702 | 1 | 0.1% |
| 47804 | 1 | 0.1% |
| 226619 | 1 | 0.1% |
| 76469 | 1 | 0.1% |
| 125693 | 1 | 0.1% |
| 174553 | 1 | 0.1% |
| Other values (987) | 987 | 98.7% |
| Value | Count | Frequency (%) |
| 61 | 1 | 0.1% |
| 96 | 1 | 0.1% |
| 102 | 1 | 0.1% |
| 115 | 1 | 0.1% |
| 164 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1791916 | 1 | 0.1% |
| 1583625 | 1 | 0.1% |
| 1222645 | 1 | 0.1% |
| 1047747 | 1 | 0.1% |
| 1045588 | 1 | 0.1% |
revenue (millions)
Real number (ℝ≥0)
| Distinct | 814 |
|---|---|
| Distinct (%) | 93.3% |
| Missing | 128 |
| Missing (%) | 12.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 82.95637615 |
|---|---|
| Minimum | 0 |
| Maximum | 936.63 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.211 |
| Q1 | 13.27 |
| median | 47.985 |
| Q3 | 113.715 |
| 95-th percentile | 293.88 |
| Maximum | 936.63 |
| Range | 936.63 |
| Interquartile range (IQR) | 100.445 |
Descriptive statistics
| Standard deviation | 103.2535405 |
|---|---|
| Coefficient of variation (CV) | 1.244672746 |
| Kurtosis | 10.60763453 |
| Mean | 82.95637615 |
| Median Absolute Deviation (MAD) | 41.285 |
| Skewness | 2.592515866 |
| Sum | 72337.96 |
| Variance | 10661.29362 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.03 | 7 | 0.7% |
| 0.01 | 5 | 0.5% |
| 0.04 | 4 | 0.4% |
| 0.02 | 4 | 0.4% |
| 0.32 | 4 | 0.4% |
| 0.05 | 4 | 0.4% |
| 1.29 | 3 | 0.3% |
| 0.15 | 3 | 0.3% |
| 2.2 | 3 | 0.3% |
| 0.54 | 3 | 0.3% |
| Other values (804) | 832 | 83.2% |
| (Missing) | 128 | 12.8% |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 0.01 | 5 | 0.5% |
| 0.02 | 4 | 0.4% |
| 0.03 | 7 | 0.7% |
| 0.04 | 4 | 0.4% |
| Value | Count | Frequency (%) |
| 936.63 | 1 | 0.1% |
| 760.51 | 1 | 0.1% |
| 652.18 | 1 | 0.1% |
| 623.28 | 1 | 0.1% |
| 533.32 | 1 | 0.1% |
metascore
Real number (ℝ≥0)
| Distinct | 84 |
|---|---|
| Distinct (%) | 9.0% |
| Missing | 64 |
| Missing (%) | 6.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.98504274 |
|---|---|
| Minimum | 11 |
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 31 |
| Q1 | 47 |
| median | 59.5 |
| Q3 | 72 |
| 95-th percentile | 85 |
| Maximum | 100 |
| Range | 89 |
| Interquartile range (IQR) | 25 |
Descriptive statistics
| Standard deviation | 17.19475702 |
|---|---|
| Coefficient of variation (CV) | 0.2915104614 |
| Kurtosis | -0.6122051468 |
| Mean | 58.98504274 |
| Median Absolute Deviation (MAD) | 12.5 |
| Skewness | -0.1238873467 |
| Sum | 55210 |
| Variance | 295.6596691 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 66 | 25 | 2.5% |
| 72 | 25 | 2.5% |
| 68 | 25 | 2.5% |
| 64 | 24 | 2.4% |
| 57 | 23 | 2.3% |
| 51 | 22 | 2.2% |
| 65 | 22 | 2.2% |
| 48 | 21 | 2.1% |
| 81 | 21 | 2.1% |
| 76 | 21 | 2.1% |
| Other values (74) | 707 | 70.7% |
| (Missing) | 64 | 6.4% |
| Value | Count | Frequency (%) |
| 11 | 1 | 0.1% |
| 15 | 1 | 0.1% |
| 16 | 1 | 0.1% |
| 18 | 4 | 0.4% |
| 19 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 100 | 1 | 0.1% |
| 99 | 1 | 0.1% |
| 98 | 1 | 0.1% |
| 96 | 4 | 0.4% |
| 95 | 3 | 0.3% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
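The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Toy data (hypothetical, not taken from the dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

r_linear = pearson_r(x, y)
# Shifting and rescaling y by a positive factor leaves r unchanged.
r_shifted = pearson_r(x, [3.0 * v + 7.0 for v in y])
# Flipping the sign of y flips the sign of r.
r_negative = pearson_r(x, [-v for v in y])
print(r_linear, r_shifted, r_negative)
```

With pandas available, `DataFrame.corr(method="pearson")` computes the same quantity for every pair of numeric columns.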
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
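As a rough sketch of that procedure, the example below ranks both variables (assigning tied values their average rank, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. The helper names and toy data are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r (the n - 1 factors cancel, so they are omitted)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y = x**3 is monotonic but not linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)  # rho reaches 1 while r stays below 1
```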
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
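The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with an illustrative function name and toy data; library implementations such as `scipy.stats.kendalltau` default to a tie-corrected variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
tau_reversed = kendall_tau_a(x, [4, 3, 2, 1])
print(tau, tau_reversed)
```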
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. See the Phik documentation for details.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
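One way to read the nullity correlation shown in the heatmap is as the Pearson correlation between per-column "is missing" indicators. The sketch below assumes that interpretation; the two short columns are hypothetical stand-ins, not values from the dataset, and the helper names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nullity_correlation(col_a, col_b):
    """Correlate the missing-value indicators (1 = missing) of two columns."""
    missing_a = [1 if v is None else 0 for v in col_a]
    missing_b = [1 if v is None else 0 for v in col_b]
    return pearson_r(missing_a, missing_b)


# Hypothetical columns; None marks a missing cell.
revenue = [333.13, 126.46, None, 8.01, None, 19.64]
metascore = [76.0, 65.0, 71.0, None, 45.0, None]

corr = nullity_correlation(revenue, metascore)
print(corr)  # negative: in this toy data the two columns are never missing together
```

With pandas, the equivalent one-liner would be `df.isnull().corr()`, restricted to columns that actually contain missing values.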
First rows
| | rank | title | genre | description | director | actors | year | runtime (minutes) | rating | votes | revenue (millions) | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe. | James Gunn | Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana | 2014 | 121 | 8.1 | 757074 | 333.13 | 76.0 |
| 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone. | Ridley Scott | Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron | 2012 | 124 | 7.0 | 485820 | 126.46 | 65.0 |
| 2 | 3 | Split | Horror,Thriller | Three girls are kidnapped by a man with a diagnosed 23 distinct personalities. They must try to escape before the apparent emergence of a frightful new 24th. | M. Night Shyamalan | James McAvoy, Anya Taylor-Joy, Haley Lu Richardson, Jessica Sula | 2016 | 117 | 7.3 | 157606 | 138.12 | 62.0 |
| 3 | 4 | Sing | Animation,Comedy,Family | In a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists' find that their lives will never be the same. | Christophe Lourdelet | Matthew McConaughey,Reese Witherspoon, Seth MacFarlane, Scarlett Johansson | 2016 | 108 | 7.2 | 60545 | 270.32 | 59.0 |
| 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | A secret government agency recruits some of the most dangerous incarcerated super-villains to form a defensive task force. Their first mission: save the world from the apocalypse. | David Ayer | Will Smith, Jared Leto, Margot Robbie, Viola Davis | 2016 | 123 | 6.2 | 393727 | 325.02 | 40.0 |
| 5 | 6 | The Great Wall | Action,Adventure,Fantasy | European mercenaries searching for black powder become embroiled in the defense of the Great Wall of China against a horde of monstrous creatures. | Yimou Zhang | Matt Damon, Tian Jing, Willem Dafoe, Andy Lau | 2016 | 103 | 6.1 | 56036 | 45.13 | 42.0 |
| 6 | 7 | La La Land | Comedy,Drama,Music | A jazz pianist falls for an aspiring actress in Los Angeles. | Damien Chazelle | Ryan Gosling, Emma Stone, Rosemarie DeWitt, J.K. Simmons | 2016 | 128 | 8.3 | 258682 | 151.06 | 93.0 |
| 7 | 8 | Mindhorn | Comedy | A has-been actor best known for playing the title character in the 1980s detective series "Mindhorn" must work with the police when a serial killer says that he will only speak with Detective Mindhorn, whom he believes to be a real person. | Sean Foley | Essie Davis, Andrea Riseborough, Julian Barratt,Kenneth Branagh | 2016 | 89 | 6.4 | 2490 | NaN | 71.0 |
| 8 | 9 | The Lost City of Z | Action,Adventure,Biography | A true-life drama, centering on British explorer Col. Percival Fawcett, who disappeared while searching for a mysterious city in the Amazon in the 1920s. | James Gray | Charlie Hunnam, Robert Pattinson, Sienna Miller, Tom Holland | 2016 | 141 | 7.1 | 7188 | 8.01 | 78.0 |
| 9 | 10 | Passengers | Adventure,Drama,Romance | A spacecraft traveling to a distant colony planet and transporting thousands of people has a malfunction in its sleep chambers. As a result, two passengers are awakened 90 years early. | Morten Tyldum | Jennifer Lawrence, Chris Pratt, Michael Sheen,Laurence Fishburne | 2016 | 116 | 7.0 | 192177 | 100.01 | 41.0 |
Last rows
| | rank | title | genre | description | director | actors | year | runtime (minutes) | rating | votes | revenue (millions) | metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 990 | 991 | Underworld: Rise of the Lycans | Action,Adventure,Fantasy | An origins story centered on the centuries-old feud between the race of aristocratic vampires and their onetime slaves, the Lycans. | Patrick Tatopoulos | Rhona Mitra, Michael Sheen, Bill Nighy, Steven Mackintosh | 2009 | 92 | 6.6 | 129708 | 45.80 | 44.0 |
| 991 | 992 | Taare Zameen Par | Drama,Family,Music | An eight-year-old boy is thought to be a lazy trouble-maker, until the new art teacher has the patience and compassion to discover the real problem behind his struggles in school. | Aamir Khan | Darsheel Safary, Aamir Khan, Tanay Chheda, Sachet Engineer | 2007 | 165 | 8.5 | 102697 | 1.20 | 42.0 |
| 992 | 993 | Take Me Home Tonight | Comedy,Drama,Romance | Four years after graduation, an awkward high school genius uses his sister's boyfriend's Labor Day party as the perfect opportunity to make his move on his high school crush. | Michael Dowse | Topher Grace, Anna Faris, Dan Fogler, Teresa Palmer | 2011 | 97 | 6.3 | 45419 | 6.92 | NaN |
| 993 | 994 | Resident Evil: Afterlife | Action,Adventure,Horror | While still out to destroy the evil Umbrella Corporation, Alice joins a group of survivors living in a prison surrounded by the infected who also want to relocate to the mysterious but supposedly unharmed safe haven known only as Arcadia. | Paul W.S. Anderson | Milla Jovovich, Ali Larter, Wentworth Miller,Kim Coates | 2010 | 97 | 5.9 | 140900 | 60.13 | 37.0 |
| 994 | 995 | Project X | Comedy | 3 high school seniors throw a birthday party to make a name for themselves. As the night progresses, things spiral out of control as word of the party spreads. | Nima Nourizadeh | Thomas Mann, Oliver Cooper, Jonathan Daniel Brown, Dax Flame | 2012 | 88 | 6.7 | 164088 | 54.72 | 48.0 |
| 995 | 996 | Secret in Their Eyes | Crime,Drama,Mystery | A tight-knit team of rising investigators, along with their supervisor, is suddenly torn apart when they discover that one of their own teenage daughters has been brutally murdered. | Billy Ray | Chiwetel Ejiofor, Nicole Kidman, Julia Roberts, Dean Norris | 2015 | 111 | 6.2 | 27585 | NaN | 45.0 |
| 996 | 997 | Hostel: Part II | Horror | Three American college students studying abroad are lured to a Slovakian hostel, and discover the grim reality behind it. | Eli Roth | Lauren German, Heather Matarazzo, Bijou Phillips, Roger Bart | 2007 | 94 | 5.5 | 73152 | 17.54 | 46.0 |
| 997 | 998 | Step Up 2: The Streets | Drama,Music,Romance | Romantic sparks occur between two dance students from different backgrounds at the Maryland School of the Arts. | Jon M. Chu | Robert Hoffman, Briana Evigan, Cassie Ventura, Adam G. Sevani | 2008 | 98 | 6.2 | 70699 | 58.01 | 50.0 |
| 998 | 999 | Search Party | Adventure,Comedy | A pair of friends embark on a mission to reunite their pal with the woman he was going to marry. | Scot Armstrong | Adam Pally, T.J. Miller, Thomas Middleditch,Shannon Woodward | 2014 | 93 | 5.6 | 4881 | NaN | 22.0 |
| 999 | 1000 | Nine Lives | Comedy,Family,Fantasy | A stuffy businessman finds himself trapped inside the body of his family's cat. | Barry Sonnenfeld | Kevin Spacey, Jennifer Garner, Robbie Amell,Cheryl Hines | 2016 | 87 | 5.3 | 12435 | 19.64 | 11.0 |